
Progress meeting 20.04.2024
Dataset overview
Variable cleaning
Principal Component Analysis
K-means Clustering
Hidden Markov Modeling
Next steps






The dataset contains only the normalized variables for land use and lateral continuity (areas normalized by the valley bottom area).
Based on high values of correlation and similarities in the PCA, the following variables are removed:
floodplain_slope, as it is well represented by talweg_slope
gravel_bars_pc, as it is well represented by active_channel_pc
water_channel_width, as it is well represented by active_channel_width
valley_bottom_width, as it is well represented by sum_area
semi_natural_pc, as it is calculated incorrectly and only represents grassland_pc
reversible_pc, as it is calculated incorrectly and only represents grassland_pc and crops_pc
infrastructures_pc, dense_urban_pc, and diffuse_urban_pc, as they are well represented by built_environment_pc
natural_corridor_width, as it is well represented by connected_corridor_width
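The correlation-based part of this screening can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis: the function names, the 0.9 threshold, and the toy values are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is a hypothetical choice, not a value from the analysis.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Invented toy values, only to show the mechanics (not project data).
toy = {
    "floodplain_slope": [0.010, 0.020, 0.030, 0.040, 0.050],
    "talweg_slope": [0.011, 0.019, 0.032, 0.041, 0.049],
    "crops_pc": [40.0, 10.0, 35.0, 5.0, 20.0],
}
print(redundant_pairs(toy))
```

Each flagged pair is a candidate for dropping one of its two variables. Which one to keep remains a judgment call, as in the list above.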









According to the results, the first four principal components together explain 64.5 % of the variability of the dataset. In the following, each of these PCs is interpreted based on the variables most strongly associated with it.
| PC | Description |
|---|---|
| PC1 | Positive values indicate large rivers in wide valleys with low slopes and low elevations, with a comparatively small riparian corridor and diverse anthropogenic activity in the adjacent areas. Negative values indicate smaller rivers in narrow valleys with higher slopes and elevations, with a greater relative area for the riparian corridor and less activity in the adjacent areas. |
| PC2 | Positive values indicate rather narrow valleys in which most of the space is taken up by the water channel, with little space for the connected corridor and crops. Negative values indicate wide valleys with smaller channel width to valley width ratios and larger shares of connected corridor and crops. |
| PC3 | Positive values indicate comparatively large and forested riparian corridors at lower elevations with little grassland and natural open area. Negative values thus indicate comparatively small and unforested riparian corridors at higher elevations with more natural open areas and grasslands. |
| PC4 | Positive values indicate rather small, confined streams with a strong presence of anthropogenic infrastructure. Negative values thus indicate comparatively larger rivers with more space for the active channel and no built/anthropogenic infrastructure in the adjacent zones. |
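The share of variability covered by the first few PCs is a cumulative explained-variance ratio: each component's variance (eigenvalue) divided by the total, summed over the leading components. The sketch below shows this calculation in plain Python. The function names and the eigenvalues are invented for illustration and are not the values from this analysis.

```python
from itertools import accumulate


def cumulative_explained_variance(eigenvalues):
    """Cumulative share of total variance, components sorted by variance."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    return [running / total for running in accumulate(ordered)]


def components_needed(eigenvalues, target):
    """Smallest number of leading components whose share reaches target."""
    shares = cumulative_explained_variance(eigenvalues)
    for k, share in enumerate(shares, start=1):
        if share >= target:
            return k
    return len(eigenvalues)


# Invented eigenvalues (not the project's PCA output).
toy_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
print(cumulative_explained_variance(toy_eigenvalues))
print(components_needed(toy_eigenvalues, target=0.8))
```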

















K-means is a clustering method that partitions the data into k clusters by searching for cluster centers (centroids) such that the squared distance between each data point and the centroid of its assigned cluster is minimized. To apply this method, the number of clusters must be determined first. For this purpose, 24 different indices were evaluated using the NbClust package.
According to the majority rule among these indices, the best number of clusters is 5.
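To make the two steps concrete, here is a minimal sketch of k-means (Lloyd's algorithm) and of a majority vote over suggested cluster counts. It is written in plain Python for illustration and is not the NbClust code used in the analysis. The function names, the toy points, and the list of votes are assumptions.

```python
import random
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def majority_vote(suggested_k):
    """Cluster count proposed by the most indices (first seen wins ties)."""
    return Counter(suggested_k).most_common(1)[0][0]


# Invented toy data: two well-separated groups in 2D.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(toy_points, k=2)
print(labels)

# Invented votes standing in for the per-index suggestions.
print(majority_vote([3, 2, 3, 4, 3]))
```

The result of Lloyd's algorithm depends on the initial centroids, so in practice several random starts are usually compared.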




Based on the data distributions, the main characteristics of the clusters are summarized in the following table:
| Cluster | Label | Derived characteristics |
|---|---|---|
| 1 | Rivers confined by anthropogenized floodplain | Rather confined, lower-elevation rivers with an altered riparian zone including diverse uses such as urban and agricultural infrastructure. |
| 2 | Larger rivers with agricultural landscape | Larger rivers in wide valleys with low slopes and low elevations, with semi-intensive riparian corridor use due to agricultural activity. |
| 3 | Small upstream rivers | Smaller and unforested riparian corridors at higher elevations, with more natural open areas and grasslands and less activity in the adjacent areas. |
| 4 | Forested medium-sized rivers | Large and forested riparian corridors at lower elevations with little grassland and natural open area. |
| 5 | Diverse medium-sized and large rivers | Medium-sized and larger streams at lower elevations with different land-use patterns and active channel sizes. |

A 3-state HMM is applied to the cluster series of the Isère River. Modeling is done with the HMM package, using the Baum-Welch algorithm to fit the model and the Viterbi algorithm to compute the most probable path of states.
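The decoding step can be illustrated with a minimal log-space Viterbi sketch in plain Python. It is not the HMM package code, and it does not cover the Baum-Welch fitting step. The state names, the probabilities, and the observation series are invented for the example; the observed symbols stand for cluster labels 1 to 5.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a discrete-emission HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def emission_row(dominant):
    """Hypothetical emission distribution: one dominant cluster label."""
    return {c: (0.6 if c == dominant else 0.1) for c in range(1, 6)}


# Invented 3-state model with "sticky" transitions (not the fitted model).
states = ["S1", "S2", "S3"]
start_p = {s: 1 / 3 for s in states}
trans_p = {r: {s: (0.8 if r == s else 0.1) for s in states} for r in states}
emit_p = {"S1": emission_row(3), "S2": emission_row(4), "S3": emission_row(2)}

# Invented cluster series along a river, upstream to downstream.
cluster_series = [3, 3, 4, 3, 3, 2, 2, 2]
print(viterbi(cluster_series, states, start_p, trans_p, emit_p))
```

With sticky transitions, an isolated deviating cluster label does not force a state change, which is what makes the decoded path smoother than the raw cluster series.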


(ongoing) literature review: most common ways / important variables to characterize the physical properties of river networks and segments
HMM:
geographic tree structure
multivariate HMM
further classification methods:
Main goal of work? - longitudinal characterisation and segment classification of water courses
Which result is expected? (exploratory analysis, package, functional extension of app, …)